Chapter 13 - On-line Learning from Finite Training Sets

By David Barber, Department of Medical Biophysics, University of Nijmegen, 6525 EZ Nijmegen, The Netherlands, and Peter Sollich, Department of Physics, University of Edinburgh, Edinburgh EH9 3JZ, U.K.
Edited by David Saad, Aston University

Book: On-Line Learning in Neural Networks
Published online: 28 January 2010
Print publication: 28 January 1999, pp. 279-302
Abstract
We analyse online gradient descent learning from finite training sets at non-infinitesimal learning rates η for both linear and non-linear networks. In the linear case, exact results are obtained for the time-dependent generalization error of networks with a large number of weights N, trained on p = αN examples. This allows us to study in detail the effects of finite training set size α on, for example, the optimal choice of learning rate η. We also compare online and offline learning, with η optimized separately for each at a given final learning time. Online learning turns out to be much more robust to input bias and actually outperforms offline learning when such bias is present; for unbiased inputs, online and offline learning perform almost equally well. Our analysis of online learning for non-linear networks (namely, soft committee machines) advances the theory to more realistic learning scenarios. Dynamical equations are derived for an appropriate set of order parameters; these are exact in the limiting case of either linear networks or infinite training sets. Preliminary comparisons with simulations suggest that the theory captures some effects of finite training sets, but may not yet account correctly for the presence of local minima.
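To make the online/offline distinction concrete, the following is a minimal sketch of the two update rules for a linear network on a fixed training set. All sizes, learning rates, and iteration counts here are illustrative assumptions, not the values analysed in the chapter.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative linear student/teacher setup (parameters are assumptions).
N, p, eta = 20, 40, 0.02
X = rng.standard_normal((p, N))      # finite training set of p inputs
y = X @ rng.standard_normal(N)       # noise-free teacher outputs

# Offline (batch) learning: each update uses the whole training set.
w_off = np.zeros(N)
for _ in range(2000):
    w_off -= eta * X.T @ (X @ w_off - y) / p

# Online learning: each update uses a single randomly drawn example.
w_on = np.zeros(N)
for _ in range(2000 * p):
    i = rng.integers(p)
    w_on -= eta * (X[i] @ w_on - y[i]) * X[i]

train_err_off = np.mean((X @ w_off - y) ** 2)
train_err_on = np.mean((X @ w_on - y) ** 2)
```

Both rules descend the same squared training error; the chapter's comparison is made at matched final learning time, with η optimized separately for each rule.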
Introduction
The analysis of online (gradient descent) learning, which is one of the most common approaches to supervised learning found in the neural networks community, has recently been the focus of much attention. The characteristic feature of online learning is that the weights of a network (‘student’) are updated each time a new training example is presented, such that the error on this example is reduced.
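The update rule just described can be sketched for a linear student trained on a finite set of p = αN examples. All parameter values below (N, α, η, number of steps) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the online rule above for a linear student (values illustrative).
N = 50                             # number of student weights
alpha = 2.0                        # training set size ratio
p = int(alpha * N)                 # p = alpha * N stored examples
eta = 0.01                         # finite (non-infinitesimal) learning rate

w_star = rng.standard_normal(N)    # 'teacher' weights defining the task
X = rng.standard_normal((p, N))    # the finite training set
y = X @ w_star                     # noise-free target outputs

w = np.zeros(N)                    # student weights
for _ in range(20000):
    i = rng.integers(p)            # present one training example
    delta = X[i] @ w - y[i]        # student's error on this example
    w -= eta * delta * X[i]        # gradient step reducing that error

# Generalization error: squared error averaged over fresh inputs.
X_fresh = rng.standard_normal((2000, N))
gen_err = np.mean((X_fresh @ (w - w_star)) ** 2)
```

Because the training set is finite, the same p examples are revisited many times, so the weight dynamics depend on α; this is the regime the chapter analyses.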